Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the n_protocol_dag_results property to ProtocolResult #381

Conversation

ianmkenney
Copy link
Member

This PR adds the n_protocol_dag_results property to the ProtocolResult class. It reflects the number of ProtocolDAGResults that were input to the _gather call.

The `gather` aggregation method removes too much information when
reducing the provided protocol_dag_results. This change adds the new
n_protocol_dag_results property that reflects the number of
ProtocolDAGResult objects that were used to create the ProtocolResult.
Copy link

codecov bot commented Oct 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.67%. Comparing base (10fef9b) to head (6cdbb72).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #381   +/-   ##
=======================================
  Coverage   98.66%   98.67%           
=======================================
  Files          36       36           
  Lines        2097     2109   +12     
=======================================
+ Hits         2069     2081   +12     
  Misses         28       28           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ianmkenney
Copy link
Member Author

I'm probably being a little (very) pedantic here, but since gather takes in an Iterable, which can include non-terminating generators, it's possible for gather to hang forever (before running out of memory and exiting).

Since I'm using the len function in this PR, even well-behaved generators will no longer work. An alternative would be to use list(protocol_dag_results), but this is still susceptible to the infinite generator case. I'm not sure what the best option for typing is, since there doesn't appear to be a good builtin that enforces a terminating iterable and I think defining and comparing to a custom type might be more extreme than just not allowing generators.

All of that said, I'm happy to revert the guardrail (and its test) if it seems like too much for this PR. Although I suspect type checkers will complain if I use len rather than list.

@ianmkenney ianmkenney requested a review from dotsdl October 30, 2024 23:14
@dotsdl
Copy link
Member

dotsdl commented Oct 31, 2024

I think the case of infinite ProtocolDAGResults isn't remotely likely, but I see your point. We could slate changing the type hint for gather from Iterable to list for gufe 2.0, indicating that we explicitly expect a list (and therefore a finite number) of ProtocolDAGResults fed to gather.

Do you agree with this?

@dotsdl
Copy link
Member

dotsdl commented Oct 31, 2024

You will also need to make changes to the _to_dict and _from_dict methods in ProtocolResult reflecting these changes.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also blocking on downstream testing and maybe a bit of a discussion on protocol needs.

self._data = data
self._n_protocol_dag_results = (
n_protocol_dag_results if n_protocol_dag_results is not None else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to wrap my head around the use case here, could you explain the case where None happens? I.e. at first glance setting None to zero seems incorrect if None means that the gather method did not pass through a value. In that case None means undefined and it's much cleaner than getting 0 with the idea that you got no results (but you do).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the context of a Protocol's gather, you'll never see a None passed in. I can see instances where ProtocolResults are used directly, incorrectly or not, and a None would make sense. But of course I immediately throw away the None in the code you highlighted above. I suppose there is still some indecision on my part what the right signature should be and which values the ProtocolResult holds on to. I'm tempted to just drop Optional[int] and use int with default 0.

Just from the tests, I see instances where someone would make a ProtocolResult directly and I'll probably do something similar in testing parts of stratocaster downstream:

class DummyProtocolResult(gufe.ProtocolResult):
    def get_estimate(self):
        return self.data['estimate']

    def get_uncertainty(self):
        return self.data['uncertainty']


class TestProtocolResult(GufeTokenizableTestsMixin):
    cls = DummyProtocolResult
    key = 'DummyProtocolResult-b7b854b39c1e37feabec58b4000680a0'
    repr = f'<{key}>'

    @pytest.fixture
    def instance(self):
        return DummyProtocolResult(
            estimate=4.2 * unit.kilojoule_per_mole,
            uncertainty=0.2 * unit.kilojoule_per_mole,
        )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ianmkenney - is this a time critical thing? I would like to spend a bit of time to check what this means for the existing Protocols, but I don't think I'll have time until Monday :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. I'll be testing stratocaster strategies based on this branch until it's merged in the hopes that we get some solution that's close in spirit to what we have here.

@ianmkenney
Copy link
Member Author

I think the case of infinite ProtocolDAGResults isn't remotely likely, but I see your point. We could slate changing the type hint for gather from Iterable to list for gufe 2.0, indicating that we explicitly expect a list (and therefore a finite number) of ProtocolDAGResults fed to gather.

Do you agree with this?

Yeah that works, I think in the long term it will simplify typing issues.

IAlibay and others added 4 commits October 31, 2024 16:22
`_to_dict` and `_from_dict` of `ProtocolResult` were also updated to
account for the new n_protocol_dag_results property
Explicitly test `get_uncertainty`, `get_estimate`, and `n_protocol_dag_results`
@dotsdl dotsdl added this to the Release 1.2 milestone Nov 12, 2024
@dotsdl
Copy link
Member

dotsdl commented Nov 12, 2024

@IAlibay downstream do you serialize ProtocolResult objects, or ProtocolDAGResult objects? The former will be impacted by this change, but the latter will not.

@IAlibay
Copy link
Member

IAlibay commented Nov 18, 2024

@IAlibay downstream do you serialize ProtocolResult objects, or ProtocolDAGResult objects? The former will be impacted by this change, but the latter will not.

We serialize ProtocolResult objects - since this is facing a ton of users, we'd want to be able to deserialize any 1.x ProtocolResults - is that possible? I.e. could we add a try/except in from_dict?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

gufe/protocols/protocol.py Outdated Show resolved Hide resolved
ianmkenney and others added 2 commits November 20, 2024 20:38
First attempt to extract n_protocol_dag_results from the data dict. If
a KeyError is raised, as you'd find for ProtocolResults serialized
prior to this commit, set n_protocol_dag_results to the default value
of 0.

Co-authored-by: Irfan Alibay <[email protected]>
Test creating a ProtocolResult from a dictionary that is missing the
n_protocol_dag_results key.
@ianmkenney ianmkenney force-pushed the feature/protocolresult_n_protocol_dag_results branch from 93adf94 to 8d7f3b7 Compare November 21, 2024 04:40
@ianmkenney ianmkenney requested a review from IAlibay November 21, 2024 15:44
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks!

@dotsdl
Copy link
Member

dotsdl commented Nov 26, 2024

@ianmkenney if you can resolve conflicts, I think this one is good to go for merge!

@ianmkenney ianmkenney merged commit a9b5982 into OpenFreeEnergy:main Nov 26, 2024
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants